-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: add two point distance function #1512
Conversation
I'm not really a fan of this method. I know it's not meant to be high performance, but all those linalg calls are just slower. I also think we'll need some sort of minimum image function anyway, (ie given a vector and a box, return a vector which is the minimum vector), so I'd rather this function use that. |
Returning a vector isn't difficult. But for correctness tests this implementation might still be useful. I have tested it on a simulation with a cubic box and the algorithm worked perfect. But if someone has a triclinic cell example that would be great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you want to merge this, I would be in favor. Main issue is the docs and no tests.
sa = np.dot(ibox, a) | ||
sb = np.dot(ibox, b) | ||
s = sa - sb | ||
s -= np.round(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is np.round()
different from np.rint()
? – the latter would be the canonical function for this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are equivalent
return np.linalg.norm(a - b) | ||
|
||
box = triclinic_vectors(box) | ||
ibox = np.linalg.inv(box) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can explicitly calculate the inverse as in #1475.
This is marginally faster in numpy but might be a lot faster in cython.
import numpy as np
def make_box():
a = [np.random.random(), 0, 0]
b = np.zeros(3)
b[:2] = np.random.random(2)
c = np.random.random(3)
return 20 * np.array([a, b, c])
def ibox(box):
a,b,c = box
return np.array([[1/a[0], 0, 0], [-b[0]/(a[0]*b[1]), 1/b[1], 0], [(b[0]*c[1] - b[1]*c[0])/(a[0]*b[1]*c[2]), -c[1]/(b[1]*c[2]), 1/c[2]]])
Timing:
box = make_box()
%timeit ibox(box)
# 7.33 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.linalg.inv(box)
#9.98 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Correctness
In [42]: np.dot(box, ibox(box))
Out[42]:
array([[ 1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 1.90479818e-18, 1.00000000e+00, 0.00000000e+00],
[ 3.40613054e-17, -7.40127872e-17, 1.00000000e+00]])
(Closed form from sympy...).
from sympy.interactive.printing import init_printing
init_printing(use_unicode=False, wrap_line=False, no_global=True)
a0, b0, b1, c0, c1, c2 = sym.symbols("a0, b0, b1, c0, c1, c2")
h = sym.Matrix([[a0, b0, c0], [0, b1, c1], [0, 0, c2]])
h.inv()
In [35]: h
Out[35]:
[a0 b0 c0]
[ ]
[0 b1 c1]
[ ]
[0 0 c2]
In [37]: h.inv()
Out[37]:
[1 -b0 b0*c1 - b1*c0]
[-- ----- -------------]
[a0 a0*b1 a0*b1*c2 ]
[ ]
[ 1 -c1 ]
[0 -- ----- ]
[ b1 b1*c2 ]
[ ]
[ 1 ]
[0 0 -- ]
[ c2 ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes with cython this can be speed up. But I have to rewrite to use less variables to get the full speedup and not keep calling python functions.
from numba import jit
import numpy as np
@jit
def jit_ibox(box):
inv = np.zeros(box.shape)
inv[0, 0] = 1 / box[0, 0]
inv[1, 0] = -box[1,0] / (box[0,0]*box[1,1])
inv[1, 1] = 1 / box[1,1]
inv[2, 0] = (box[1,0]*box[2,1] - box[1,1]*box[2,0])/(box[0,0]*box[1,1]*box[2,2])
inv[2, 1] = -box[2,1]/(box[1,1]*box[2,2])
inv[2, 2] = 1 / box[2,2]
return inv
note: I tested this with numba because it has similar performance but much shorter test cycles
%timeit jit_ibox(box)
656 ns ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks promising; what is the performance of np.linalg.inv()
on your machine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sour don't what you report. It might not help much though because of the dot products that are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry autocorrect. My performance of Linalg.inv is comparable to yours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. And yes, you're right, of course, the matrix-vector multiplications are bad (although I don't know if np.dot
uses any optimized BLAS/LAPACK)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does but the python function call as well as the input verification that numpy does is expensive. Maybe we can unpack that more using explicit formulas calculated with sympy.
@@ -208,6 +208,23 @@ def _check_lengths_match(*arrays): | |||
raise ValueError("Input arrays must all be same shape" | |||
"Got {0}".format([a.shape for a in arrays])) | |||
|
|||
|
|||
def distance(a, b, box=None): | |||
"""calc distances between two points. Uses algorithm from Tuckerman |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string should state that this takes any PBC into account. I would also reference Tuckerman properly (and possibly even write out the math because we would want to use it as a test case for correctness and in this case, one cannot be explicit enough.)
In [40]: tric = mda.Universe(MDAnalysis.tests.datafiles.PSF_TRICLINIC, MDAnalysis.tests.datafiles.DCD_TRICLINIC)
In [41]: tric.dimensions
Out[41]:
array([ 35.44603729, 35.06156158, 34.15850449, 91.32802582,
61.7352066 , 44.4070282 ], dtype=float32) |
@orbeckst do you also have a testcase that doesn't require our test data? I would like to be able to place a bead as pure numpy arrays and calculate the minimal image by hand. These super basic methods shouldn't depend on our test files |
@kain88-de You can copy the box and the coordinates of two points from one frame of |
I would like to have points for cell types that I can also check by hand. If you want to provide some I'm happy so use them. I think it will take me some time to generate them myself. |
See #1475 (comment) |
Fixes #1262
Changes made in this Pull Request:
distance
function for two atom positions.TODO
a
,b
box
including the format that is used.PR Checklist